A Model for Varying Speaking Style in TTS systems
Abstract
This paper aims to enhance the performance of a TTS system by generating various speaking styles. First, we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody of the “neutral” speech produced by the eLite TTS system ([1]). The differences involve about 20 prosodic characteristics (F0 span, speech rate, pauses and hesitation, primary and secondary accentuation, schwa deletion, etc.). To make the neutral speech resemble a given speaking style, these prosodic characteristics are implemented either within the TTS system itself or in a postprocessing step. The quality of the “stylized” synthesis is evaluated by comparing it to the original style.
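As a rough illustration of what a prosodic postprocessing step might look like, the sketch below adjusts two of the characteristics named in the abstract, F0 span and speech rate. It is not the eLite implementation: the function names, the semitone-based scaling around the mean, and the factor values are all assumptions made for the example.

```python
import math

def scale_f0_span(f0_hz, span_factor):
    """Widen or narrow the F0 span around the contour's mean (hypothetical helper).

    f0_hz: F0 values in Hz, with 0 marking unvoiced frames.
    span_factor: > 1 widens the span, < 1 narrows it (assumed parameter).
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return list(f0_hz)
    # Work in semitones so the scaling is uniform on a logarithmic pitch scale.
    semitones = [12 * math.log2(f) for f in voiced]
    mean_st = sum(semitones) / len(semitones)
    result = []
    for f in f0_hz:
        if f <= 0:
            result.append(0.0)  # leave unvoiced frames untouched
        else:
            st = 12 * math.log2(f)
            result.append(2 ** ((mean_st + span_factor * (st - mean_st)) / 12))
    return result

def scale_speech_rate(durations_s, rate_factor):
    """Speed up (rate_factor > 1) or slow down speech by rescaling segment durations."""
    return [d / rate_factor for d in durations_s]

# Illustrative values only, not measurements from the paper.
neutral_f0 = [100.0, 0.0, 200.0]
neutral_durations = [0.10, 0.20]
styled_f0 = scale_f0_span(neutral_f0, span_factor=2.0)
styled_durations = scale_speech_rate(neutral_durations, rate_factor=2.0)
```

Scaling in semitones rather than in Hz keeps the widening symmetric in pitch terms around the mean; a real system would also need to handle the other characteristics (pauses, accentuation, schwa deletion), which this sketch leaves out.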
Similar resources
Automatic prosodic modeling for speaker and task adaptation in text-to-speech
One of the most important demands for future TTS systems is their ability to improve naturalness when embedded in a particular task or application that requires a particular speaking style for a particular speaker. In this paper, we present a new prosodic modeling procedure for improving naturalness by adapting a TTS system to a new speaker and a new speaking style. The proposed procedure is an...
A Corpus-based Approach to <ahem/> Expressive Speech Synthesis
Human speech communication can be thought of as comprising two channels – the words themselves, and the style in which they are spoken. Each of these channels carries information. Today's most-advanced text-to-speech (TTS) systems such as [1],[2],[3],[4] fall far short of human speech because they offer only a single, fixed style of delivery, independent of the message. In this paper, we descri...
Adding speaking style to a TTS system
This paper aims to enhance the performance of a TTS system by generating various speaking styles. First we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody in “neutral” speech uttered by the eLite TTS system ([1]). Differences concern about 20 prosodic characteristics (F0 span, spee...
HMM-based Expressive Speech Synthesis: Towards TTS with Arbitrary Speaking Styles and Emotions
This paper describes recent progress in our approach to generating expressive speech. A goal of text-to-speech (TTS) synthesis is the ability to generate natural-sounding speech with arbitrary speaker voice characteristics, speaking styles and emotional expressions. To change the voice, speaking style and/or emotion of the synthetic speech arbitrarily while maintaining its naturalness, i...
Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation
One of the biggest challenges in speech synthesis is the production of contextually-appropriate naturally sounding synthetic voices. This means that a Text-To-Speech system must be able to analyze a text beyond the sentence limits in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identifi...